Class 11 Assignment - Remote Sensing Image Classification

Lake Mead Change Over Time

Class 11 Concepts & Themes:

This week’s assignment will encompass the following concepts covered in Class 11 lecture & lab:

  • Landsat & Sentinel Satellite Programs

  • Multi-spectral Imagery

  • Band Combinations

  • Classification Methods - Supervised vs. Unsupervised

  • 4 essential satellite image resolutions

Class 11 Lab Materials:

Class 11 Readings:

The class 11 quiz (Quiz 9 - Satellite Imagery) will cover only content from the Essentials of Geographic Information Systems textbook as follows:

Class 11 Assignment: Lake Mead Extent Reduction - Detection through Supervised Classification

Assignment Introduction:

The following assignment utilizes a supervised land cover classification method to determine contemporary extent reduction at Lake Mead, a critical water resource for multiple counties and states in the western US. The area of concern is located at an intersection of Arizona and Nevada states, slightly east of Las Vegas. Here water storage within the reservoir is near capacity in 2000, severely dwindling in extent over the next two decades. While the reservoir is measured in terms of total capacity as a factor of elevation and extent, through remote sensing we can access the spatial extent of the waterbody over time. In this way, we can gauge the locations within the reservoir that a diminishing over time.

‘Bathtub’ rings at Lake Mead - Evidence of Severe Reservoir Water Loss

Through the assignment classification process, land cover change between the two original input images can be determined. For this analysis, 2 Landsat images from 2000/08/07 - 2021/08/09 (two images) will be utilized.

  • The general framework for a image analysis within QGIS can be summarized as follows:

    • Input project datasets

    • Composite bands to create banded products

    • Classify banded imagery utilizing an Supervised Classification regime.

    • Run Change detection to ascertain land cover difference/change across time.

    • Quantify change and produce cartographic output.

Part I - Step I - Data acquistion:

  • Both the Landsat and Sentinel programs can be accessed freely and easily through web interfaces, API scripting and a collection of tools (plugins) for QGIS. In this assignment, imagery will be acquired via a web interface. Two of the most prominent web interfaces are as follows:

  • NASA’s Earth Explorer - https://earthexplorer.usgs.gov/

Earth Explorer Interface

Sentinel Interface

Note: regardless of the interface or product, its important to keep in mind that the acquired imagery is required to be cloudless (or as cloudless as possible) across both time periods, within the analysis area. Any clouds present will obstruct accurate analysis where those clouds exist in the product. While there are several ways to correct for cloud coverage, in the following workflow clouds will simply negate accurate analysis where they exist. This is an acceptable condition as the area of concern (analysis area) is itself relatively cloud free. Where there are clouds, they will be highlighted in the assignment steps below.

  • For this assignment, utilize Earth Explorer. First, register and create a user login via the following link:

  • https://ers.cr.usgs.gov/register

  • Once complete, navigate to the Earth Explorer interface as a logged-on user. The first criteria will be location expressed as a polygon upload. In the C11 Assignment Data Backup - Vector Features, upload as shapefile the zipped folder entitled bounding_geometry.zip. This represents the bounding box for the vector feature lakebnds.shp which is the USGS feature for the known and authoritative extent of the Lake Mead Reservior. Make the first entry under Search Criteria in the Earth Explorer Tool Interface at left:

Upload bounding_geometry.zip to Earth Explorer

  • Next, select the data products under the Data Sets tab. The products are 1 Landsat 7 scene and 1 Landsat 8 scene. You will need to expland the catalog tree to get to the correct products:

Landsat 7 scene + 1 Landsat 8 Product Selection

  • Next, select the date range. Make the date range slightly larger than the actual range needed.

The first scene is as follows under the Landsat 7 ETM+ C2 L2 tab:

  • ID: LE07_L2SP_039035_20000807_20200918_02_T1

  • Date Acquired: 2000/08/07

  • Path: 039

  • Row: 035

  • Page Location: 143 of 151

Landsat 7 ETM+ C2 L2

The second scene under the Landsat 8 OLI/TIRS C2 LS is as follows:

  • ID: LC08_L2SP_039035_20210809_20210819_02_T1

  • Acquisition Date: 2021/08/09

  • Path: 039

  • Row: 035

  • Page Location: 2 of 59

Landsat 8 OLI/TIRS C2 LS

  • Next, click download options (small green downward icon) first for the Landsat 7 scene, followed exactly by the Landsat 8 scene:

download options

  • Under the download options, you will be downloading the full bundle accessed by the top button for both the Landsat 7 and Landsat 8 products. This will result in .tar compressed packages that you will place into your assignment working directory:

Product Bundle Download

Part I - Step II - Data Preparation:

  • Once downloaded, there are still several steps necessary to utilize the data effectively within QGIS. Place the downloaded data into a folder for the assignment. Place the two compressed directories into the folder (.tar file extensions). Save a QGIS project into the folder for the assignment directory that you make. Further, make several folders inside the project folder - these will hold outputs that are derived during the assignment. The following assignment directory structure is recommended:

Example Assignment Directory

  • c11 Assignment Vector Features

    • Bounding Geometry
    • US States
    • Lake Mead Bounds
  • classed

  • 2 .tar for the Landsat Scenes

  • processing

  • training

  • The two Landsat tile scenes are downloaded as .tar compressed folders. This type of compressor requires 7 Zip on Windows, or Keka on a Mac to uncompress. In Keka, extraction will produce an open directory ready to access. 7-zip, however, may produce an intermediate .tar compressed file. The .tar must then be extracted again to gain access to the root uncompressed data. In the end the directory image should look as follows:

Landsat Scenes Uncompressed in Directory

  • Opening LE07_L2SP_039035_20000807_20200918_02_T1 directory first will show the Landsat bands 1 - 7 that will be utilized in QGIS. Note that band 6 is skipped:

Import bands for the Landsat 7 Scene

Note: that the naming convention of the tile set will denote the product type and date, broken down as follows using the 2000 tile set as example:

LE07_L2SP_039035_20000807_20200918_02_T1

  • LE07 = Landsat 07

  • 039035 = path 039, row 035

  • 20000807 = year 2000, month 08, day 07

Note: thebands#.tif files - the files that hold the raster cell values - end differently than other sidecar files of the product. The sidecar files are helpful metadata for other analyses that can be done with Landsat imagery. For this assignment, the focus will be the bands themselves; the SR reference denotes Surface Reflectance values.

  • Import the raw bands for the 2000 Landsat 7 tile scene into the QGIS Layer Panel via the Open Data Source Manager. This gives an overview of the raster extent; however details in the imagery are indecipherable in the current segmented band format. In the next steps, this will be addressed to produce a banded product.

Part II - Step I - Build Virtual Banded Product for the Landsat 7 Scene:

  • There are several ways to created multiband, multispectral imagery in QGIS. In this assignment, we will rely on the Build Virtual Raster tool to perform this task as well as the supervised classification process.

  • Search for the tool in the processing toolbox after importing all bands into QGIS:

Bands Imported to Layers Panel

Search and Load Build Virtual Raster Tool

Populate the Tool; Make Sure Separate Bands is checked ON

Review Result Virtual

  • If the Virtual raster is successfully produced, this raster is now exported to the processing folder as 2000_composite`:

Export as GeoTIFF

Part II - Step II - Build Virtual Banded Product for the Landsat 8 Scene:

  • Repeat the same process as above, but for the LC08_L2SP_039035_20210809_20210819_02_T1 Landsat 8 Scene. Note that the band array includes Band 6 for Landsat 8:

Landsat 8 Bands

  • Build a Virtual Raster, followed by Exporting the GeoTIFF as 2021_composite to the processing folder.

Part III - Review Analysis Scope:

  • The two composites can now be compared; the extent difference of Lake Mead should be evident where the 2021 extent is significantly diminished from the baseline 2000 extent:

2000 and 2021 Composites

  1. In the 2000 image, there is a clustered cloud cover at the right on the image, as well as a small portion over the center of Lake Mead. Ideally these clouds would not exist. As they are relatively minor, we will proceed with the image, and accept that we cannot accurately classify water value pixels where there are clouds.

  2. Colors across both composites are not consistent. This will be corrected in the next step where the an appropriate band combination will be applied first to Landsat 7 2000, then Landsat 8 2021 scenes.

Part IV - Building the Bandset:

  • At this juncture, the individual bands for each analysis period have been combined into two separate multi-band rasters. Importantly, these multi-band products can now be visualized through band combinations in order to visually assess land cover patterns. There are many band combinations available, but for our purposes we want a combination that sets water off from land, creating a demarcation that will be helpful when training the imagery for water vs land classes.

Landsat 7 compared to Landsat 8 Band Combinations

  • To start, navigate to symbology properties of the first 2000 multi-band raster and set the bands to RGB > 4/5/3 as noted in the chart above. Repeat for the 2021 raster, but set the bands to RGB > 5/6/4:

Landsat 7 - 4/5/3 Combination

Landsat 8 - 5/6/4 Combination

  • Check the result. Save the QGIS project with just the two year GeoTIFFs loaded into the Layer Panel.

Note: Band combinations are a property of raster symbology. They can easily be reapplied as needed. Typically if the layers are saved with the combinations applied, the combinations will persist. However, if needed in the future, reapplication of a band combination is a quick, easy step as noted above.

Part V - Supervised Classification - Training Input:

  • With the band combinations complete and visualized to accentuate land cover differences, QGIS will be utilized to class pixels from continous values across the multi-band product into discrete, classes or ‘buckets’ representing land cover types. The classification algorithm must be trained to know which pixel combinations to place in one discrete class versus another discrete class. As much as this process is quantitative, there is an ‘art’ to classification for a supervised classification run. Here the analyst has jurisdiction over what pixel types will go into which land cover classes.

  • To Start, install the plugin - dzetsaka classification tool plugin:

dzetsaka classification tool plugin; the current stable release is 3.70 - March, 2021

  • Once installed, the main tool can be accessed from the Main Toolbar via the following icon:

dzetsaka Icon in Main Menu

  • Navigate to the processingfolder and load in the 2000_composite raster into the Layer Panel if it is not loaded currently. Make sure the 4/5/3 band combination is applied as it will be necessary for the classification steps below.

  • Next, a training shapefile will be created. This is a task that we have not done yet in this course; however, it not hard. Navigate Layer > Create Layer > New Shapefile layer:

New Shapefile Layer

The tool will be filled out with the following parameters:

  • Specify the output directory and set the name of shapefile ~training/2000_training.shp
  • Select the geometry type as shapefile: polygon
  • Select the coordinate system: Project CRS: EPSG:32621 - WGS 84/UTM zone 11N (this is the current project projection system set from both rasters. Landsat Scenes will always come projected with their appropriate UTM Zone. Analysis should proceed in this map projection.)
  • Create a new field for specifying class of the area Class, insert it into the Name field, and click Add to Fields List.

Add Class as Text Field; Save 2000_training.shp

  • An empty .shp is now placed into the Layer Panel. Toogle ON editing via the digitizing toolbar located in Main Menu:

Edit ON, Digitizing Toolbar

  • Next, click the Add Polygon icon (green blob icon) and zoom into the water feature in the center of the image. The first class that will be captured will be ID 1 for water.

Capturing the First Polygon for Water Features

Note: Make sure you are training the 2000 composite with the 2000 training, not the 2021 composite which will be done following the 2000 composite imagery.

  • Make sure the polygon captures only waterbody pixels. When complete, the polygon fills with an opaque color. If you make a mistake, you can remove a specific polygon from the training set at the attribute table.

Completed Training Polygon

If a Mistake is Made, Remove it from Attribute Table, then Save.

  • Continue drawing out several more waterbody training polygons for water areas that may have slightly lighter or darker pixel combinations. The goal here is to inform the classification algorithm to look for anything that is water, regardless if its lighter or darker…. but not any land features. Always place a 1 for the ID and water for the class. Make sure to capture lighter colored blues that are clearly water, but avoid capturing any shoreline pixels. Capture approximately 10 polygons of various sizes both in the center of the water body but also along shoreline areas. The goal is to get a representative sampling of all pixel combinations that represent the water extent of Lake Mead in the 2000 imagery. In the following image a polygon is created next to the shoreline, but not over the shoreline:

Classing near, but not on, Shoreline

  • Once finished with the water class, check the attribute table and make sure each polygon record has an ID of 1 and a class name of water. Next, move to the 2 class which is land. Here the precision of the polygons is not as important; the goal is not to break down land types, but to set of everything that is land as separate from water. Capture land areas that are close to the shoreline; we will mask the imagery later in the process, and we will only keep those areas that are either water or areas that are not water, but close to the shoreline.

Again, create approximately 10 polygons, but not more - 10 should be more than enough for the land class. Check the attribute table and the results:

Training Polygons Complete for 2000 Composite

Completed Training Attribute Table

Note: for the clouds in the far right of the 2000 composite image, class those white, light purple pixels as land, i.e. we will not include them as the primary water class. Also note that some of the polygons in the example image above are not particularly close to the shoreline; these are not the best training locations as we will mask the imagery before classification. A better approach is to keep all training in and close to the Lake Mead water extent.

  • Next, create another training shapefile, but for the 2021 composite. Use the same process above, but this time save in processing as 2021_training.shp. Replicate all the steps taken for the 2000 composite, but creating training polygons within the 2021_training.shp based on the 2021 composite imagery, NOT the 2000 composite imagery.

2021 Training Setup

Critical Note: 2010 training is for the 2000 composite; a separate training is done for the 2021 composite, named 2021_training.shp. Make sure to not mix up the trainings - one for 2000 and then one for 2021, based on the respective imagery.

  • When Complete, the result should be two complete trainings, and two composite raster images:

Training Complete

Part VI - Masking:

  • With the training complete, the classification can begin. However, we want the imagery to be masked as close the feature of interest - the Lake Mead water extent. As we are not interested in land cover outside the water extents, we can remove these geographies from classification through a masking process. This will dramatically speed up classification processing as the classification algorithm does not have to evaluate pixels outside the masked area.

  • To Start, input the lake_mead_bounding_box.shp into the project (unzip bounding_geometry.zip first):

Masking Geometry

  • Next, navigate to the masking tool in the processing toolbox:

Mask Raster - Processing Toolbox

  • Populate the tool for 2000 composite, then 2021 composite:

Mask Parameters - 2000

Mask Parameters - 2021

Save 2000_composite_masked.tif

Save 2021_composite_masked.tif

  • Check the masked results. Going forward, the masked rasters will be used as the input into the classification tools forthcoming.

Imagery Masked to Lake Mead Water Extent

Part VII - Classification:

Navigate to the dzetsaka classification tool and populate for the first run - 2000 imagery:

Dzetsaka Classification Tool - 2000 Imagery Population

  • Next, click on the gear wheel at right and set the classifier to K Nearest Neighbors. The classifcation will run on the id variable - 1 for water, 2 for land. The tool will run for upwards 20 minutes, but may be as quick as several minutes depending on the processing capacity of your machine. Once complete, it will place the classification results in to the layers panel.

Note: If you are unable to select the K Nearest Neigbhors method, or it asks for the scikit-learn python library, you have two choices:

  1. Stay with the default, built-in Gaussian Mixture Model. The upside is that you don’t have to do anything further, and this methods runs quickly. The downside is the resulting classification will likely be slightly less accurate to the water body extent than the K Nearest Neigbhors.
  1. Install the python library necessary for the K Nearest Neigbhors method to run within the dzetsaka tool. In the section scikit-learn install located at the end of the assignment, this extra installation step covers installation for both Windows and Linux/macOS.

You can symbolize as Paletted/Unique values, and then compare the classfication results to the original imagery. A good classification should clearly demarcate water from the shoreline, and land areas:

IPaletted/Unique values Applied

Classed Result

  • Save the QGIS project. Repeat all the steps that were just done for the 2000 masked imagery but now using the 2021, using the respective training set for 2021, NOT the 2000 training set. Once complete, export both classification results to the classed folder as 2000_classed.tif and 2021_classed.tif, respectively.

  • Clear the workspace and keep just the 2000_classed.tif and 2021_classed.tifrasters and save the updated project QGIS.

  • Next the difference between the two rasters will be calculated. There are several ways to approach the difference problem. As is, there are several values in each classed result - 1, 0 and 2 representing the original id values, and then also values along the edges of the image that contain neither values.

  • Possible difference combinations include the following:

    • 1 - 0 = 1 (water value 2000 - no value 2021)
    • 1 - 1 = 0 (no value 2000 - no value 2021)
    • 2 - 1 = 1 (land value 2000 - water value 2021)
    • 1 - 2 = -1 (water value 2000 - land value 2021)
    • 0 - 0 = 1 (no value 2000 - no value 2000)
    • 2 - 0 = 2 (land value 2000 - no value 2021)

We are interested in the condition -1 where there was water in 2000 but became land in 2021. This represents the diminishment of the Lake Mead water extent over the past two decades, largely due to climate change dynamics that have severely stressed this water resource.

Before running a difference function, zoom in and compare your classification results. Your 2021 classed raster should show a water extent that is definitely diminished from the 2000 classed raster:

Classed Differences Across 2000 > 2021

  • Open the raster calculator and form the following equation, exporting the raster result as 2000_2021_difference:

    • "2000_classed@1" - "2021_classed@1"

Save 2000_2021_difference to classed folder

-Review the result. Note the 1 value; this is the cloud condition over the water extent. We will disregard this class. The focus for the further analysis will be the -1 value:

Classed Difference Result

  • To finish the classification and change detection, the -1 value will be isolated by setting all export values except -1 to no data. Export the 2000_2021_difference to difference_final in the classed folder, and populate the no data parameters as follows:

Exclude Values from Final Result

  • Use the value tool (we’ve used this tool in past assignments) to preview the binary result of no data vs the -1 value:

Final Classification Values for Water Extent Difference

Part VIII - Final Cartography:

  • Organize a new workspace and save c.11.assignment.cartography with the following layers broken into two layer groups - main.map.frame and inset.frame:

  • difference_final

  • ESRI Satellite base or another base layer accessed from the QuickMapServices plugin

  • lake_mead_boundary_box

  • cb 2018 us states

Final Cartography Organization

  • At this juncture, most of the elements needed for the final map layout are ready. In the concluding section below, statistics are developed so that deforestation area and percentage can be included in the final map design. The final map design should include the following:

    • Landscape or portrait orientation map sheet (your choice)

    • Main map frame featuring the difference pixels atop quickmapservices basemap imagery.

    • An inset map denoting location of the project area.

    • Titling that explains the location, date range and thematic purpose of the map.

    • Simple legend for the difference class (-1).

    • Source the data as follows: Landsat-8 and Landsat-7 analysis imagery courtesy of the U.S. Geological Survey

  • See the following layout example for basic design positioning suggestions:

Assignment 11 Draft Layout

  • Assignment 11 Draft Layout PDF

  • In the example above, the source tag has not been included. The projection for the inset map has been altered to preserve a horizontal orientation for the inset map. The main map frame utilizes the UTM Zone 11N for the scale bar. The lake mead bounding geometry was used explicitly for the inset map to show the main map frame extent. Remember, we can avoid this by using the Overview insert capacity for map inset in Map Layout.

Part IX - Statistics for Map Layout:

  • As detailed in the section above, the map product will feature the water surface loss in the map layout. Of course this is helpful to conveying the spatial distribution of main map theme - water loss. However, we don’t have a quantitative areal or percentage expression. This can be achieved easily, and incorporated into the final map design as a text explainer or developed within the legend.

  • Save the c.11.assignment.cartography QGIS project and open as new project and import just the classed layers:

  • difference_final

  • 2021_classed

  • 2000_classed

Classed Raster Layers

  • Navigate to Processing Toolbox and search report:

Raster Report

  • First, place the 2000_classed raster into the tool parameter, and run as a temporary report. This will produce an html report in the lower right of the Map Canvas window.

  • This will produce a report.html document which can be accessed and seen as follows:

report.html

report.html

This tells us the total square area (squared meters) of the 1 value in 2000 is 549405900 m² or 212 mi².

To express units as square miles, the following equation can be utilized:

  • mi² =m² * 0.00000038610

Next, run the same tool process on the 2021_classed raster. In the example, the 1 value is found to be 303937200 m² or 117 mi². The difference between the 1 value 2000_classed - 2021_classed will be similar as -1 in the difference_final, accounting for the condition where clouds that were present over water in the 2000_classed layer where removed from consideration as a water value. This discrepancy should amount to a small fraction of the total extent in 2000. In the example classification, it accounts for 711000 m², or 0.27 mi².

In the end, the difference_final extent is found to be 246179700 m² or 95 mi². This is the amount of water surface extent that has been displaced since 2000.

The loss of 303226200 m² or, again, 117 mi², constitutes a 44% percent loss in surface extent from 2000 to 2021, based on the following formula:

  • Original Value(water 2000) - Final Value(water 2021) / Original Value(water 2000) * 100

  • For the final cartographic design, consider incorporating your statistics into the final map layout to help contextualize the thematic purpose of the map - surface water extent loss 2000 - 2021 at the Lake Mead Reservoir.

scikit-learn Python Package Installation:

As part of the QGIS install, a Python installation also occurs in the background, and a series of Python Libraries are also installed. However, scikit-learn is not one of them. This Python library features a series of cluster algorithms, including k-Means. When the dzetsaka tool calls on the K Nearest Neigbhors method, it relies on this library. To start, the installation for a Windows environment is shown, followed by a Linux/macOS environment. The steps below are also shown in the guidance for the dzetsaka tool.

  • Windows Install:

  • For Windows, the first find and navigate to OsGeo shell which is a command line environment for the QGIS install:

Search for the OsGeo shell within the Windows Start Search

  • Once open, simply type o4w_env to set up the environment:

OsGeo shell

  • Next, pip is called to install the scikit-learn package:

python3 -m pip install scikit-learn -U --user

Type of the Command for the Pip Install

  • Once complete, you should see the package install and a ‘successful install’ note. Depending on your pip install, you may be reminded to upgrade to a newer version; you don’t need to do that if the scikit-learn package is successful:

Install Log within the Shell

  • Restart your machine, open QGIS and navigate back to your project and the K Nearest Neigbhors method should now run without complication.

  • Linux/macOS

  • For Linux/MacOS, the first find and navigate to terminal on your machine which should be located in Utilities:

macOS Utilities

Terminal Icon

  • Next, open the terminal and enter the following command:

python3 -m pip install scikit-learn -U --user

Install scikit-learn via Command Line

  • A successful install will give notice as follows:

Sucessful Install

  • Restart your machine, open QGIS and navigate back to your project and the K Nearest Neigbhors method should now run without complication.

Further Reference - Sentinel: